Last updated: 2025-05-28

Checks: 6 1

Knit directory: /mnt/central_nas/projects/type1_diabetes/nathan/BlockCourse/2024_BlockCourse/T1D_analysis/

This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


The R Markdown file has unstaged changes. To know which version of the R Markdown file created these results, you’ll want to first commit it to the Git repo. If you’re still working on the analysis, you can ignore this warning. When you’re finished, you can run wflow_publish to commit the R Markdown file and build the HTML.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

The command set.seed(12345) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 980aa9d. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    T1D_analysis/20250528_CD8a_INS_density_Immune.png
    Ignored:    T1D_analysis/20250528_CD8a_INS_density_Islet.png
    Ignored:    T1D_analysis/figure/
    Ignored:    processing/

Untracked files:
    Untracked:  T1D_analysis/05_CellCategories_cells_Compressed.html

Unstaged changes:
    Modified:   T1D_analysis/01_ImportData_cells_Compressed.html
    Modified:   T1D_analysis/02_SpilloverCompensation_cells_Compressed.html
    Modified:   T1D_analysis/03_TransformCorrect_cells_Compressed.html
    Modified:   T1D_analysis/04_QualityControl_cells_Compressed.html
    Modified:   T1D_analysis/05_CellCategories_cells_Compressed.Rmd
    Modified:   T1D_analysis/05_CellCategories_cells_Uncompressed.Rmd
    Modified:   T1D_analysis/06_CellTypes_cells_Compressed.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (T1D_analysis/06_CellTypes_cells_Compressed.Rmd) and HTML (T1D_analysis/06_CellTypes_cells_Compressed.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
Rmd 955a62a nathansteenbuck 2025-05-28 full update 2025 annotation
Rmd 3c04956 nathansteenbuck 2024-11-21 final script added
Rmd e21a12b nathansteenbuck 2024-11-20 edits scripts
Rmd f9ec97c nathansteenbuck 2024-11-20 adjust descriptions
Rmd 43dff72 nathansteenbuck 2024-11-19 update analysis_pipeline
Rmd 08f607f nathansteenbuck 2024-11-19 update channel decomposition + installation
Rmd abf8b5b nathansteenbuck 2024-10-23 backbone_3

Rscript -e “rmarkdown::render(‘2024_BlockCourse/T1D_analysis/06_CellTypes_cells_Compressed.Rmd’)”

Goal

In the following scripts, cell types are attributed to all cells in the dataset in an iterative way:

In both scripts this is performed by performing PhenoGraph clustering using the Rphenoannoy package.

The resulting cell categories, which are used in downstream analyses are stored as colData(spe)$cell_category, and the cell types are stored as colData(spe)$cell_type.

Settings

Load packages

suppressPackageStartupMessages(c(
  library(data.table),
  library(dplyr),
  library(SpatialExperiment),
  library(parallel),
  library(tictoc),
  library(purrr),
  library(furrr)
))

Paths and settings

# Paths
if (!dir.exists(paths$folder_script)) dir.create(paths$folder_script)
plotsave_param$path <- paths$folder_script
plotsave_param_large$path <- paths$folder_script

# Misc settings
today <- gsub("-", "", Sys.Date())

Read in the data

Load the SpatialExperiment (SPE) object saved at the previous step.

fn_spe <- file.path(paths$folder_out, paste0(paths$object_type, "_", paths$panel_type, ".rds"))
spe <- readRDS(fn_spe)
print(spe)
class: SpatialExperiment 
dim: 27 139629 
metadata(2): spillover_matrix subset
assays(6): counts compcounts ... scaled fastMNN_case_id
rownames(27): H3 CD44_GCG ... DNA3 PPY
rowData names(10): channel metal ... shortname channel_name
colnames(139629): 6238_Compressed_001_1 6238_Compressed_001_2 ...
  6396_Compressed_030_1445 6396_Compressed_030_1446
colData names(32): case_id panel ... Pheno1_scaled_cat cell_category
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):
spatialCoords names(2) : cell_x cell_y
imgData names(1): sample_id

Clustering

Settings

Load packages

suppressPackageStartupMessages(c(
  library(ggplot2),
  library(scater),
  library(scuttle),
  library(scran),
  library(igraph),
  library(Rphenoannoy),
  library(clustree),
  library(ranger),
  library(patchwork),
  library(BiocParallel),
  library(tictoc)
))

Channels and assays

Select channels (channels_clust), assays (assay_sel) and clustering methods (methods_sel) to use.

methods_sel <- c("Pheno1")

assay_sel <- c("scaled")
names(assay_sel) <- c("scaled")

writeLines(c("Assays:", assay_sel[assay_sel %in% assayNames(spe)], 
                        assay_sel[assay_sel %in% reducedDimNames(spe)]))
Assays:
scaled
dimred_sel <- c("UMAP")
writeLines(c("\nReduced dimensions:", dimred_sel))

Reduced dimensions:
UMAP
channels <- rownames(spe)[!(grepl("DNA|H3", rownames(spe)))]
cat(c("\nChannels:", channels[channels %in% rownames(spe)]))

Channels: CD44_GCG GLUT1 CD99 CD68 MPO SMA CD20_SST AMY CD3e_NKX6_1 CK19 PDX1 SYP CD45RO FOXP3 CD45RA CD8a_INS CA9 IAPP CD4_ProINS CD31 Ecdh PTPRN PCSK2 PPY
cat(c("\nNumber of channels:", length(channels)))

Number of channels: 24

Select channels to use for clustering

Reduced dimension plots showing marker expression that generated by the previous script can be used to select the most relevant markers for clustering.

channels_clust <- rownames(rowData(spe)[rowData(spe)$clustering == 1, ])
cat(c("\nChannels used for unsupervised clustering:",
      channels_clust[channels_clust %in% rownames(spe)]))

Channels used for unsupervised clustering: CD44_GCG GLUT1 CD99 CD68 MPO SMA CD20_SST AMY CD3e_NKX6_1 CK19 PDX1 SYP CD45RO FOXP3 CD45RA CD8a_INS CA9 IAPP CD4_ProINS CD31 Ecdh PTPRN PCSK2 PPY

Subset

Now, we create new SCE objects for each cell category

sce_isl <- spe[, colData(spe)$cell_category == "Islet"]
sce_immune <- spe[, colData(spe)$cell_category == "Immune"]
sce_exo <- spe[, colData(spe)$cell_category == "Exocrine"]
sce_stroma <- spe[, colData(spe)$cell_category == "Stroma"]
sce_other <- spe[, colData(spe)$cell_category == "Other"]

Channel Decomposition

We will now decompose the channels into their respective signals and noise components. We have the following assumptions:

Assuming S(CD20) and S(SST) represent the true signals and N(CD20) and N(SST) represent the noise components, we can now write the following equations:

Perform Channel Decomposition

channels_compressed <- c("CD44_GCG", "CD8a_INS", "CD4_ProINS", "CD3e_NKX6_1", "CD20_SST")
channels_uncompressed <- c("CD44", "GCG", "CD8a", "INS", "CD4", "ProINS", "CD3e", "NKX6_1", "CD20", "SST")

Fit 2-component GMMs with unequal variance to each channel in each cell category to estimate the means(“M_”) and variances (“V_”) of the signal (“S_”) and noise (“N_”) components.

In addition, extract the threshold value that separates the signal from the noise.

coefficients <- list()
coefficients <- map(channels_compressed, \(channel) {
  # Immune: 
  S_Immune_N_Islet <- mclust::densityMclust(data = assay(sce_immune, "exprs")[channel, ], G = 2, modelNames = "V")
  M_S_Immune_N_Islet <- S_Immune_N_Islet$parameters$mean[2] 
  V_S_Immune_N_Islet <- S_Immune_N_Islet$parameters$variance$sigmasq[2]
  # Extract maximal value for the noise -> threshold value.
  threshold_Immune <- max(S_Immune_N_Islet$data[S_Immune_N_Islet$classification == 1,])
  
  # Islet:
  N_Immune_S_Islet <- mclust::densityMclust(data = assay(sce_isl, "exprs")[channel, ], G = 2, modelNames = "V")
  M_N_Immune_S_Islet <- N_Immune_S_Islet$parameters$mean[2]
  V_N_Immune_S_Islet <- N_Immune_S_Islet$parameters$variance$sigmasq[2]
  threshold_Islet <- max(N_Immune_S_Islet$data[N_Immune_S_Islet$classification == 1,])

  # Exocrine:
  N_Immune_N_Islet <- mclust::densityMclust(data = assay(sce_exo, "exprs")[channel, ], G = 2, modelNames = "V")
  M_N_Immune_N_Islet <- N_Immune_N_Islet$parameters$mean[1]
  V_N_Immune_N_Islet <- N_Immune_N_Islet$parameters$variance$sigmasq[1]
  
  # Stroma:
  I_Stroma <- mclust::densityMclust(data = assay(sce_stroma, "exprs")[channel, ], G = 2, modelNames = "V")$parameters$mean[1]

  coefficients <- c(M_S_Immune_N_Islet, V_S_Immune_N_Islet, 
                    M_N_Immune_S_Islet, V_N_Immune_S_Islet, 
                    M_N_Immune_N_Islet, V_N_Immune_N_Islet,
                    threshold_Islet, threshold_Immune)

  names(coefficients) <- c("M_S_Immune_N_Islet", "V_S_Immune_N_Islet", 
                           "M_N_Immune_S_Islet", "V_N_Immune_S_Islet", 
                           "M_N_Immune_N_Islet", "V_N_Immune_N_Islet",
                           "threshold_Islet", "threshold_Immune")
  coefficients
}) |> 
  bind_rows() |> 
  mutate(channel = channels_compressed)

Inspect the coefficients.

coefficients
# A tibble: 5 × 9
  M_S_Immune_N_Islet V_S_Immune_N_Islet M_N_Immune_S_Islet V_N_Immune_S_Islet
               <dbl>              <dbl>              <dbl>              <dbl>
1              0.362             0.126               1.98              0.778 
2              0.539             0.186               1.36              0.566 
3              0.370             0.0507              2.04              1.51  
4              0.179             0.0101              0.336             0.0218
5              0.402             0.382               2.84              2.34  
# ℹ 5 more variables: M_N_Immune_N_Islet <dbl>, V_N_Immune_N_Islet <dbl>,
#   threshold_Islet <dbl>, threshold_Immune <dbl>, channel <chr>

Show CD8a_INS Distribution for Islet Cells:

-> INS signal and CD8a noise.

N_Immune_S_Islet <- mclust::densityMclust(data = assay(sce_isl, "exprs")["CD8a_INS", ], G = 2, modelNames = "V")

p <- tibble(counts = assay(sce_isl, "exprs")["CD8a_INS", ]) |> 
  ggplot(aes(x = counts)) + geom_density() + 
  labs(x = "arcinsh(counts): CD8a_INS",
       y = "Density") + 
  mytheme$standard_new() + 
  geom_vline(xintercept = N_Immune_S_Islet$parameters$mean[2], color = "blue") + 
  geom_vline(xintercept = max(N_Immune_S_Islet$data[N_Immune_S_Islet$classification == 1,]), color = "orange")
print(p)

fn <- paste0(today, "_CD8a_INS_density_Islet.png")
ggsave(fn, p, width = 10, height = 10)

Show CD8a_INS Distribution for Immune Cells: -> CD8a signal and INS noise.

N_Islet_N_Immune <- mclust::densityMclust(data = assay(sce_immune, "exprs")["CD8a_INS", ], G = 2, modelNames = "V")

p <- tibble(counts = assay(sce_immune, "exprs")["CD8a_INS", ]) |> 
  ggplot(aes(x = counts)) + geom_density() + 
  labs(x = "arcinsh(counts): CD8a_INS",
       y = "Density") + 
  mytheme$standard_new() + 
  geom_vline(xintercept = N_Islet_N_Immune$parameters$mean[2], color = "blue") + 
  geom_vline(xintercept = max(N_Islet_N_Immune$data[N_Islet_N_Immune$classification == 1,]), color = "orange")
print(p)

fn <- paste0(today, "_CD8a_INS_density_Immune.png")
ggsave(fn, p, width = 10, height = 10)

Show CD8a_INS Distribution for Exocrine Cells:

N_Islet_N_Immune <- mclust::densityMclust(data = assay(sce_exo, "exprs")["CD8a_INS", ], G = 2, modelNames = "V")

p <- tibble(counts = assay(sce_exo, "exprs")["CD8a_INS", ]) |> 
  ggplot(aes(x = counts)) + geom_density() + 
  labs(x = "arcinsh(counts): CD8a_INS",
       y = "Density") + 
  mytheme$standard_new() + 
  geom_vline(xintercept = N_Islet_N_Immune$parameters$mean[1], color = "blue") + 
  geom_vline(xintercept = max(N_Islet_N_Immune$data[N_Islet_N_Immune$classification == 1,]), color = "orange")
print(p)

fn <- paste0(today, "_CD8a_INS_density_Immune.png")
ggsave(fn, p, width = 10, height = 10)

Estimate coefficients of normal distributions (Noise + Signal)

Now, we will decompose the channels into their respective signals and noise components. Here, we estimate the means and variances of the signal and noise components for each channel in each cell category.

estimated_coeffs <- purrr::pmap(coefficients |> dplyr::select(M_S_Immune_N_Islet,
    V_S_Immune_N_Islet,
    M_N_Immune_S_Islet,
    V_N_Immune_S_Islet,
    M_N_Immune_N_Islet,
    V_N_Immune_N_Islet), 
      function(M_S_Immune_N_Islet, V_S_Immune_N_Islet, M_N_Immune_S_Islet, V_N_Immune_S_Islet,
               M_N_Immune_N_Islet, V_N_Immune_N_Islet) {
    
    # As additional constraints we introduce scaling factors between the signal + noise components.
    # Scaling factor between Islet signal (e.g. SST) and Immune signal (e.g. CD20) (estimated from the data)
    fraction_mu <- (M_N_Immune_S_Islet - M_N_Immune_N_Islet/2) / (M_S_Immune_N_Islet- M_N_Immune_N_Islet/2)
    # Scaling factor between sigma SST signal and sigma CD20 signal (estimated from the data)
    # -> This is a strong assumption and might not be true in reality.
    fraction_sigma <- (V_S_Immune_N_Islet - V_N_Immune_N_Islet/2)/(V_N_Immune_S_Islet - V_N_Immune_N_Islet/2)
    # -> we assume that the same fractions apply for the noise. 

    message("Fraction mu: ", fraction_mu, " Fraction sigma: ", fraction_sigma)
    message("1: ", M_S_Immune_N_Islet, 
            "2: ", V_S_Immune_N_Islet, 
            "3: ", M_N_Immune_S_Islet, 
            "4: ", V_N_Immune_S_Islet,
            "5: ", M_N_Immune_N_Islet,
            "6: ", V_N_Immune_N_Islet)

    # Augmented coefficient matrix
    A <- rbind(
      c(1, 0, 0, 1, 0, 0, 0, 0),  # M_S_Immune_N_Islet = mu_S_CD20 + mu_N_SST
      c(0, 1, 1, 0, 0, 0, 0, 0),  # M_N_Immune_S_Islet = mu_N_CD20 + mu_S_SST
      c(0, 0, 1, 1, 0, 0, 0, 0),  # M_N_Islet_N_Immune = mu_N_SST + mu_N_CD20
      c(0, 0, 0, 0, 1, 0, 0, 1),  # V_S_Immune_N_Islet = sigma2_S_CD20 + sigma2_N_SST
      c(0, 0, 0, 0, 0, 1, 1, 0),  # V_N_Immune_S_Islet = sigma2_N_CD20 + sigma2_S_SST
      c(0, 0, 0, 0, 0, 0, 1, 1),   # V_N_Immune_N_Islet = sigma2_N_SST + sigma2_N_CD20
      c(fraction_mu, -1, 0, 0, 0, 0, 0, 0),  # fraction_mu * mu_S_CD20 - S_SST = 0
      c(0,0,fraction_mu,-1, 0, 0, 0, 0)  # fraction_mu * mu_N_CD20 - mu_N_SST = 0
    )
    message("A: ", A)

    # Right-hand side (means and variances combined)
    B <- c(
      M_S_Immune_N_Islet, M_N_Immune_S_Islet, M_N_Immune_N_Islet,
      V_S_Immune_N_Islet, V_N_Immune_S_Islet, V_N_Immune_N_Islet, 0,0
    )

    message("B: ", B)

    # Solve by non-negative least squares. Counts should not be negative here.
    fit_nnls <- nnls::nnls(A, B)   # Solve the joint system
    coefs <- coef(fit_nnls)

    names(coefs) <- c("mu_S_Immune", "mu_S_Islet", "mu_N_Immune", "mu_N_Islet", 
                      "sigma2_S_Immune", "sigma2_S_Islet", "sigma2_N_Immune", "sigma2_N_Islet")
    coefs
}) |> dplyr::bind_rows() |> dplyr::mutate(channel = channels_compressed) 
Fraction mu: 5.68701986522426 Fraction sigma: 0.161916046350452
1: 0.3620610033206792: 0.1262384154254513: 1.983029031856694: 0.7782794776979875: 0.03243813281657716: 0.000530923996168627
A: 1000005.687019865224260010000-1001100005.687019865224261010000-100010000000010000000110000010100
B: 0.3620610033206791.983029031856690.03243813281657710.1262384154254510.7782794776979870.00053092399616862700
Fraction mu: 2.62194622232602 Fraction sigma: 0.328719998370566
1: 0.538933147934292: 0.1861459996342543: 1.362740461793784: 0.5656324971052435: 0.06204061370993216: 0.000629502165911279
A: 1000002.621946222326020010000-1001100002.621946222326021010000-100010000000010000000110000010100
B: 0.538933147934291.362740461793780.06204061370993210.1861459996342540.5656324971052430.00062950216591127900
Fraction mu: 6.57375878571027 Fraction sigma: 0.0327536893663039
1: 0.369772425233842: 0.05065069143029433: 2.041540221581654: 1.508353148110645: 0.1396739695699246: 0.00257754606802299
A: 1000006.573758785710270010000-1001100006.573758785710271010000-100010000000010000000110000010100
B: 0.369772425233842.041540221581650.1396739695699240.05065069143029431.508353148110640.0025775460680229900
Fraction mu: 2.12299935886517 Fraction sigma: 0.451489332943297
1: 0.1785272981009582: 0.01005230764545023: 0.3355035028517154: 0.02182202433564265: 0.07748862225623516: 0.000728869816702751
A: 1000002.122999358865170010000-1001100002.122999358865171010000-100010000000010000000110000010100
B: 0.1785272981009580.3355035028517150.07748862225623510.01005230764545020.02182202433564260.00072886981670275100
Fraction mu: 7.2934056547457 Fraction sigma: 0.163211052076641
1: 0.4015616715471732: 0.3823908631692243: 2.83700691148874: 2.34233577226585: 0.02915599582664976: 0.000228916649075631
A: 1000007.29340565474570010000-1001100007.29340565474571010000-100010000000010000000110000010100
B: 0.4015616715471732.83700691148870.02915599582664970.3823908631692242.34233577226580.00022891664907563100

Inspect the estimated coefficients. Please note: This doesnt work perfect yet, and is the first attempt at algorithmic decomposition.

estimated_coeffs
# A tibble: 5 × 9
  mu_S_Immune mu_S_Islet mu_N_Immune mu_N_Islet sigma2_S_Immune sigma2_S_Islet
        <dbl>      <dbl>       <dbl>      <dbl>           <dbl>          <dbl>
1       0.348      1.98      0.00412     0.0221          0.126          0.778 
2       0.510      1.34      0.0164      0.0391          0.186          0.566 
3       0.307      2.02      0.0154      0.0962          0.0481         1.51  
4       0.142      0.306     0.0246      0.0473          0.0101         0.0211
5       0.388      2.83      0.00290     0.0202          0.382          2.34  
# ℹ 3 more variables: sigma2_N_Immune <dbl>, sigma2_N_Islet <dbl>,
#   channel <chr>
joint_coeffs <- estimated_coeffs |> 
  dplyr::left_join(coefficients, by = "channel") |> 
  dplyr::select(-dplyr::starts_with("M_"), -dplyr::starts_with("V_"))

full_coeffs <- joint_coeffs |> 
  mutate(sigma2 = sigma2_N_Immune + sigma2_N_Islet) |> 
  mutate(sigma2_N_Immune = sigma2/2) |>
  mutate(sigma2_N_Islet = sigma2/2) |> 
  select(-sigma2)
full_coeffs
# A tibble: 5 × 11
  mu_S_Immune mu_S_Islet mu_N_Immune mu_N_Islet sigma2_S_Immune sigma2_S_Islet
        <dbl>      <dbl>       <dbl>      <dbl>           <dbl>          <dbl>
1       0.348      1.98      0.00412     0.0221          0.126          0.778 
2       0.510      1.34      0.0164      0.0391          0.186          0.566 
3       0.307      2.02      0.0154      0.0962          0.0481         1.51  
4       0.142      0.306     0.0246      0.0473          0.0101         0.0211
5       0.388      2.83      0.00290     0.0202          0.382          2.34  
# ℹ 5 more variables: sigma2_N_Immune <dbl>, sigma2_N_Islet <dbl>,
#   channel <chr>, threshold_Islet <dbl>, threshold_Immune <dbl>

Estimate full count density

After estimating the coefficients, we can now estimate the actual counts for each cell.

These are helper functions to estimate: - The most likely S_Islet and N_Immune for a given positive Islet cell. - The most likely N_Islet and S_Immune for a given positive Immune cell. - The most likely N_Islet and N_Immune in case of noise only.

# Function to calculate the most likely S_Islet and N_Immune
likelihood_s_islet_n_immune <- function(N_Immune, Immune_Islet_obs, mu_Islet, sigma_Islet, mu_N_Immune, sigma_N_Immune) {
  S_Islet <- Immune_Islet_obs - N_Immune  # Compute N_SST from the constraint
  if (N_Immune < 0) return(0)  # Noise cannot be negative
  P_S_Islet <- dnorm(S_Islet, mean = mu_Islet, sd = sigma_Islet)
  P_N_Immune <- dnorm(N_Immune, mean = mu_N_Immune, sd = sigma_N_Immune)
  return(P_S_Islet * P_N_Immune)
}

# Function to calculate the most likely N_Islet and S_Immune
likelihood_n_islet_s_immune <- function(N_Islet, Immune_Islet_obs, mu_Immune, sigma_Immune, mu_N_Islet, sigma_N_Islet) {
  S_Immune <- Immune_Islet_obs - N_Islet  # Compute N_SST from the constraint
  if (N_Islet < 0) return(0)  # Noise cannot be negative
  P_S_Immune <- dnorm(S_Immune, mean = mu_Immune, sd = sigma_Immune)
  P_N_Islet <- dnorm(N_Islet, mean = mu_N_Islet, sd = sigma_N_Islet)
  return(P_S_Immune * P_N_Islet)
}

# Function to calculate the most likely N_Islet and N_Immune in case of noise only.
likelihood_n_islet_n_immune <- function(N_Immune, Immune_Islet_obs, mu_N_Immune, sigma_N_Immune, mu_N_Islet, sigma_N_Islet) {
  N_Islet <- Immune_Islet_obs - N_Immune  # Compute N_SST from the constraint
  if (N_Immune < 0) return(0)  # Noise cannot be negative
  P_N_Immune <- dnorm(N_Immune, mean = mu_N_Immune, sd = sigma_N_Immune)
  P_N_Islet <- dnorm(N_Islet, mean = mu_N_Islet, sd = sigma_N_Islet)
  return(P_N_Immune * P_N_Islet)
}

# Function to calculate the most likely N_Islet and N_Immune in case of noise only.
likelihood_n_islet_n_immune_2 <- function(N_Islet, Immune_Islet_obs, mu_N_Immune, sigma_N_Immune, mu_N_Islet, sigma_N_Islet) {
  N_Immune <- Immune_Islet_obs - N_Islet  # Compute N_SST from the constraint
  if (N_Islet < 0) return(0)  # Noise cannot be negative
  P_N_Immune <- dnorm(N_Immune, mean = mu_N_Immune, sd = sigma_N_Immune)
  P_N_Islet <- dnorm(N_Islet, mean = mu_N_Islet, sd = sigma_N_Islet)
  return(P_N_Immune * P_N_Islet)
}

Decompose for Islet Cells.

# IF cell_category is Islet and channel is CD20_SST:
  # Above threshold: signal intensity is SST signal + CD20 noise
  # Below threshold: signal intensity is SST noise  + CD20 noise
library(furrr)
set.seed(222)
options <- furrr::furrr_options(seed = seed)

# Islet cells.
results <- 
  purrr::pmap(full_coeffs, \(mu_S_Immune, mu_S_Islet, mu_N_Immune, mu_N_Islet, sigma2_S_Immune, sigma2_S_Islet, sigma2_N_Immune, sigma2_N_Islet, channel, threshold_Islet, threshold_Immune) {
    message(channel)
    furrr::future_map(1:length(colnames(sce_isl)), \(i) {
      # Get compressed value per cell.
      Immune_Islet_obs <- assay(sce_isl, "exprs")[channel, ][i]
      
      # Define optimization bounds. between [0, threshold_Immune-Noise]
      lower_bound1 <- ifelse(mu_N_Immune - sigma2_N_Immune < 0, 0, mu_N_Immune - sigma2_N_Immune)
      upper_bound1 <- ifelse(mu_N_Immune + sigma2_N_Immune > threshold_Immune, threshold_Immune, mu_N_Immune + sigma2_N_Immune)
      interval <- c(lower_bound1, upper_bound1)

      # Catch edge-case.  
      if(Immune_Islet_obs == 0) {
        tibble(Islet = 0, Immune = 0)
      } 
      # Above threshold: signal intensity is SST signal + CD20 noise
      else if(Immune_Islet_obs > threshold_Islet){
          result <- optimize(
            f = likelihood_s_islet_n_immune,
            interval = interval, 
            maximum = TRUE,
            Immune_Islet_obs = Immune_Islet_obs,
            mu_Islet = mu_S_Islet,
            sigma_Islet = sigma2_S_Islet,
            mu_N_Immune = mu_N_Immune,
            sigma_N_Immune = sigma2_N_Immune
          )
          # Extract the results
          N_Immune_opt <- result$maximum
          S_Islet_opt <- Immune_Islet_obs - N_Immune_opt
          tibble(Immune = N_Immune_opt, Islet = S_Islet_opt)
      }
      # Below threshold: signal intensity is SST noise  + CD20 noise
      else{
        result <- optimize(
          f = likelihood_n_islet_n_immune,
          interval = interval,  # N_Immune cannot exceed the N_Immune + Sigma_Immune
          maximum = TRUE,
          Immune_Islet_obs = Immune_Islet_obs,
          mu_N_Immune = mu_N_Immune,
          sigma_N_Immune = sigma2_N_Immune,
          mu_N_Islet = mu_N_Islet,
          sigma_N_Islet = sigma2_N_Islet
        )
        # Extract the results
        N_Immune_opt <- result$maximum
        N_Islet_opt <- Immune_Islet_obs - N_Immune_opt
        tibble(Immune = N_Immune_opt, Islet = N_Islet_opt)
      }
    }, .options = options) |> bind_rows()
})
CD44_GCG
CD8a_INS
CD4_ProINS
CD3e_NKX6_1
CD20_SST
results_optim <- results |> bind_cols()
New names:
• `Immune` -> `Immune...1`
• `Islet` -> `Islet...2`
• `Immune` -> `Immune...3`
• `Islet` -> `Islet...4`
• `Immune` -> `Immune...5`
• `Islet` -> `Islet...6`
• `Immune` -> `Immune...7`
• `Islet` -> `Islet...8`
• `Islet` -> `Islet...9`
• `Immune` -> `Immune...10`
colnames(results_optim) <- channels_uncompressed
results_optim
# A tibble: 20,999 × 10
      CD44    GCG   CD8a      INS    CD4 ProINS   CD3e   NKX6_1   CD20     SST
     <dbl>  <dbl>  <dbl>    <dbl>  <dbl>  <dbl>  <dbl>    <dbl>  <dbl>   <dbl>
 1 0.00431 0.0309 0.0167  0.0559  0.0142 0.0580 0.0249  0.0295  0      0      
 2 0.00393 0.0163 0.0167 -0.00432 0.0166 0.0176 0.0249 -0.00364 0      0      
 3 0.00410 2.41   0.0164  0.171   0.0154 0.424  0.0249  0.00666 0.0407 0.00297
 4 0.00410 2.84   0.0164  0.249   0.0154 0.514  0.0249  0.0520  0.0861 0.00297
 5 0.00410 2.71   0.0164  0.485   0.0154 1.73   0.0249  0.120   0.0890 0.00297
 6 0.00414 1.19   0.0167  0.0515  0.0166 0.228  0.0243  0.0354  0.0440 0.00297
 7 0.00410 2.14   0.0164  2.25    0.0154 1.27   0.0249  0.179   0.119  0.00297
 8 0.00431 0.235  0.0164  0.845   0.0154 2.15   0.0247  0.690   0.0785 0.00297
 9 0.00414 1.82   0.0167  0.0909  0.0166 0.276  0.0249  0.00599 0.0215 0.00297
10 0.00431 0.273  0.0164  2.38    0.0154 2.40   0.0246  0.294   0.116  0.00297
# ℹ 20,989 more rows

Check the results.

channel_oi <- "CD8a_INS"
# Split channels_oi into Immune and Islet
channel_oi2 <- channel_oi |> stringr::str_split("_") |> unlist()
channel_Immune <- channel_oi2[1]
channel_Islet <- ifelse(channel_oi == "CD8a_INS", "INS", channel_oi2[2])

## Get parameters for the channel of interest
mu_S_Islet <- full_coeffs |> dplyr::filter(channel == channel_oi) |> pull("mu_S_Islet")
mu_S_Immune <- full_coeffs |> dplyr::filter(channel == channel_oi) |> pull("mu_S_Immune")
mu_N_Islet <- full_coeffs |> dplyr::filter(channel == channel_oi) |> pull("mu_N_Islet")
mu_N_Immune <- full_coeffs |> dplyr::filter(channel == channel_oi) |> pull("mu_N_Immune")
threshold_Immune <- full_coeffs |> dplyr::filter(channel == channel_oi) |> pull("threshold_Immune")
threshold_Islet <- full_coeffs |> dplyr::filter(channel == channel_oi) |> pull("threshold_Islet")
sigma2_N_Immune <- full_coeffs |> dplyr::filter(channel == channel_oi) |> pull("sigma2_N_Immune")
sigma2_N_Islet <- full_coeffs |> dplyr::filter(channel == channel_oi) |> pull("sigma2_N_Islet")
sigma2_S_Immune <- full_coeffs |> dplyr::filter(channel == channel_oi) |> pull("sigma2_S_Immune")
sigma2_S_Islet <- full_coeffs |> dplyr::filter(channel == channel_oi) |> pull("sigma2_S_Islet")

# Compressed channel:
ggplot(tibble(x = assay(sce_isl, "exprs")[channel_oi, ]), aes(x=x)) + geom_density() + 
  geom_vline(xintercept = mu_S_Islet + mu_N_Immune, color = "yellow") + # Islet signal + Immune noise
  geom_vline(xintercept = mu_S_Islet, color = "red") + # Islet signal (INS_S)
  geom_vline(xintercept = threshold_Islet, color = "green") + # threshold.
  geom_vline(xintercept = mu_N_Islet, color = "blue") +  # mean Islet noise (INS_N)
  ggtitle(channel_oi) # Density plot.

# Signal Islet Channel
ggplot(results_optim, aes(x = get(channel_Islet))) + geom_density() + 
    geom_vline(xintercept = mu_S_Islet, color = "red") +  # INS signal.
    geom_vline(xintercept = threshold_Islet, color = "green") + # threshold.
    geom_vline(xintercept = mu_N_Islet, color = "blue") + # mean Islet noise (INS_N).
    ggtitle("Islet")

# Noise Immune Channel -> this doesnt work great.
ggplot(results_optim, aes(x = get(channel_Immune))) + geom_density() + 
  geom_vline(xintercept = mu_N_Immune, color = "blue") + 
  ggtitle("Immune")

Assign the counts.

df <- as.data.frame(t(assay(sce_isl, "exprs")))

# Now add the decomposed channels.
# We do not yet remove the compressed channels.
df_decomposed <- df |> 
  mutate(CD44 = results_optim$CD44) |>
  mutate(CD8a = results_optim$CD8a) |> 
  mutate(CD4 = results_optim$CD4) |>
  mutate(CD3e = results_optim$CD3e) |>
  mutate(CD20 = results_optim$CD20) |>
  mutate(GCG = results_optim$GCG) |>
  mutate(INS = results_optim$INS) |>
  mutate(SST = results_optim$SST) |> 
  mutate(NKX6_1 = results_optim$NKX6_1) |>
  mutate(ProINS = results_optim$ProINS)

# Uncomrpessed panel now as rowData. 
fn_panel <- file.path(paths$folder_in, paste0("panel_", "Uncompressed", ".csv"))
panel <- read.csv(fn_panel, row.names = 1)

# Create a new SCE object. 
sce_isl_2 <- SpatialExperiment(
    assays = list(exprs = as.matrix(t(df_decomposed))), 
    colData = colData(sce_isl), 
    spatialCoords = spatialCoords(sce_isl),
    metadata = metadata(sce_isl))

Decompose for Immune Cells.

Perform the same operation for the Immune cells.

set.seed(222)
options <- furrr::furrr_options(seed = seed)

# Immune: 
results_immun <- purrr::pmap(full_coeffs, \(mu_S_Immune, mu_S_Islet, mu_N_Immune, mu_N_Islet, sigma2_S_Immune, sigma2_S_Islet, sigma2_N_Immune, sigma2_N_Islet, channel, threshold_Islet, threshold_Immune) {
    message(channel)
    results_immune <- furrr::future_map(1:length(colnames(sce_immune)), \(i) {
      # Get compressed value per cell.
      Immune_Islet_obs <- assay(sce_immune, "exprs")[channel, ][i]

      # Catch edge-case.
      if(Immune_Islet_obs == 0) {
        tibble(Immune = 0, Islet = 0)
      } 
      # Above threshold: signal intensity is CD20 signal + SST noise
      else if(Immune_Islet_obs > threshold_Immune){

        # Define optimization bounds.
        # Noise cannot be lower than 0, and within the signal threshold.
        lower_bound1 <- ifelse(mu_N_Islet - sigma2_N_Islet < 0, 0, mu_N_Islet - sigma2_N_Islet)
        upper_bound1 <- ifelse(mu_N_Islet + sigma2_N_Islet > threshold_Islet, threshold_Islet, mu_N_Islet + sigma2_N_Islet)
        interval <- c(lower_bound1, upper_bound1)

        result <- optimize(
          f = likelihood_n_islet_s_immune,
          interval = interval, 
          maximum = TRUE,
          Immune_Islet_obs = Immune_Islet_obs,
          mu_Immune = mu_S_Immune,
          sigma_Immune = sigma2_S_Immune,
          mu_N_Islet = mu_N_Islet,
          sigma_N_Islet = sigma2_N_Islet
        )

        # Extract the results
        N_Islet_opt <- result$maximum
        S_Immune_opt <- Immune_Islet_obs - N_Islet_opt
        tibble(Immune = S_Immune_opt, Islet = N_Islet_opt)
      }
      # Below threshold: signal intensity is SST noise  + CD20 noise
      else{
        # Define optimization bounds. Optimize on Immune noise.
        lower_bound1 <- ifelse(mu_N_Immune - sigma2_N_Immune < 0, 0, mu_N_Immune - sigma2_N_Immune)
        upper_bound1 <- ifelse(mu_N_Immune + sigma2_N_Immune > threshold_Immune, threshold_Immune, mu_N_Immune + sigma2_N_Immune)
        interval <- c(lower_bound1, upper_bound1)

        result <- optimize(
          f = likelihood_n_islet_n_immune,
          interval = interval,  # N_Immune cannot exceed the N_Immune + Sigma_Immune
          maximum = TRUE,
          Immune_Islet_obs = Immune_Islet_obs,
          mu_N_Immune = mu_N_Immune,
          sigma_N_Immune = sigma2_N_Immune,
          mu_N_Islet = mu_N_Islet,
          sigma_N_Islet = sigma2_N_Islet
        )
        # Extract the results
        N_Immune_opt <- result$maximum
        N_Islet_opt <- Immune_Islet_obs - N_Immune_opt
        tibble(Immune = N_Immune_opt, Islet = N_Islet_opt) 
      }
  }) |> bind_rows()
})
CD44_GCG
CD8a_INS
CD4_ProINS
CD3e_NKX6_1
CD20_SST
results_immune_optim <- results_immun |> bind_cols()
New names:
• `Immune` -> `Immune...1`
• `Islet` -> `Islet...2`
• `Immune` -> `Immune...3`
• `Islet` -> `Islet...4`
• `Immune` -> `Immune...5`
• `Islet` -> `Islet...6`
• `Immune` -> `Immune...7`
• `Islet` -> `Islet...8`
• `Immune` -> `Immune...9`
• `Islet` -> `Islet...10`
colnames(results_immune_optim) <- channels_uncompressed
results_immune_optim
# A tibble: 12,653 × 10
      CD44     GCG   CD8a      INS    CD4 ProINS   CD3e   NKX6_1    CD20     SST
     <dbl>   <dbl>  <dbl>    <dbl>  <dbl>  <dbl>  <dbl>    <dbl>   <dbl>   <dbl>
 1 0.00431 0.115   0.0167 -0.00851 0.0142 0.0503 0.0249 -0.0115  0.00284 0.0161 
 2 0.00431 0.00461 0.0167  0.0252  0.0166 0.0338 0.0243  0.0464  0.00297 0.00686
 3 0.339   0.0221  0.0167  0.0114  0.0142 0.0914 0.0249  0.00609 0.00297 0.00908
 4 0.295   0.0221  0.0167  0.0565  0.0142 0.0940 0.0249  0.00740 0.00288 0.0202 
 5 0.00431 0.149   0.0167  0.0260  0.0166 0.115  0.0249  0.0248  0.00297 0.00358
 6 0.00393 0.0156  0.0167  0.00962 0.0142 0.0599 0.0249  0.0183  0.00297 0.0114 
 7 0.00431 0.00996 0.0162  0.0290  0.0142 0.0880 0.0243  0.0396  0.00297 0.0156 
 8 0.00431 0.0370  0.0167  0.0185  0.0142 0.0867 0.0243  0.0359  0.00297 0.0393 
 9 0.00431 0.0402  0.0167  0.0539  0.0142 0.0741 0.0249  0.0113  0.00297 0.0150 
10 0.00431 0.0972  0.0167  0.0239  0.0142 0.0887 0.0243  0.0345  0.00297 0.0103 
# ℹ 12,643 more rows

Check the results. Note: given that the Immune Channel usually has less signal, the decomposition is more challenging. -> Unfortunately, especially CD20 and CD3e dont have great signal.

channel_oi <- "CD8a_INS"
# Split channels_oi into Immune and Islet
channel_oi2 <- channel_oi |> stringr::str_split("_") |> unlist()
channel_Immune <- channel_oi2[1]
channel_Islet <- ifelse(channel_oi == "CD3e_NKX6_1", "NKX6_1", channel_oi2[2])

## Get parameters for the channel of interest
mu_S_Islet <- full_coeffs |> dplyr::filter(channel == channel_oi) |> pull("mu_S_Islet")
mu_S_Immune <- full_coeffs |> dplyr::filter(channel == channel_oi) |> pull("mu_S_Immune")
mu_N_Islet <- full_coeffs |> dplyr::filter(channel == channel_oi) |> pull("mu_N_Islet")
mu_N_Immune <- full_coeffs |> dplyr::filter(channel == channel_oi) |> pull("mu_N_Immune")
threshold_Immune <- full_coeffs |> dplyr::filter(channel == channel_oi) |> pull("threshold_Immune")
threshold_Islet <- full_coeffs |> dplyr::filter(channel == channel_oi) |> pull("threshold_Islet")
sigma2_N_Immune <- full_coeffs |> dplyr::filter(channel == channel_oi) |> pull("sigma2_N_Immune")
sigma2_N_Islet <- full_coeffs |> dplyr::filter(channel == channel_oi) |> pull("sigma2_N_Islet")
sigma2_S_Immune <- full_coeffs |> dplyr::filter(channel == channel_oi) |> pull("sigma2_S_Immune")
sigma2_S_Islet <- full_coeffs |> dplyr::filter(channel == channel_oi) |> pull("sigma2_S_Islet")

# Compressed channel:
ggplot(tibble(x = assay(sce_immune, "exprs")[channel_oi, ]), aes(x=x)) + geom_density() + 
  geom_vline(xintercept = mu_S_Immune, color = "red") + 
  geom_vline(xintercept = threshold_Immune, color = "green") + 
  geom_vline(xintercept = mu_N_Immune, color = "blue")

# Noise Islet Channel
ggplot(results_immune_optim, aes(x = get(channel_Islet))) + geom_density() + 
    geom_vline(xintercept = mu_N_Islet, color = "blue") + 
    ggtitle("Islet")

# Signal Immune Channel
ggplot(results_immune_optim, aes(x = get(channel_Immune))) + geom_density() + 
  geom_vline(xintercept = mu_S_Immune, color = "red") +
  geom_vline(xintercept = threshold_Immune, color = "green") +
  geom_vline(xintercept = mu_N_Immune, color = "blue") + 
  ggtitle("Immune")

Assign the counts.

df <- as.data.frame(t(assay(sce_immune, "exprs")))

# Now add the decomposed channels.
# We do not yet remove the compressed channels.
df_decomposed <- df |> 
  mutate(CD44 = results_immune_optim$CD44) |>
  mutate(CD8a = results_immune_optim$CD8a) |> 
  mutate(CD4 = results_immune_optim$CD4) |>
  mutate(CD3e = results_immune_optim$CD3e) |>
  mutate(CD20 = results_immune_optim$CD20) |>
  mutate(GCG = results_immune_optim$GCG) |>
  mutate(INS = results_immune_optim$INS) |>
  mutate(SST = results_immune_optim$SST) |> 
  mutate(NKX6_1 = results_immune_optim$NKX6_1) |>
  mutate(ProINS = results_immune_optim$ProINS)

# Uncomrpessed panel now as rowData. 
fn_panel <- file.path(paths$folder_in, paste0("panel_", "Uncompressed", ".csv"))
panel <- read.csv(fn_panel, row.names = 1)

# Create a new SCE object.
sce_immune_2 <- SpatialExperiment(
    assays = list(exprs = as.matrix(t(df_decomposed))), 
    colData = colData(sce_immune), 
    spatialCoords = spatialCoords(sce_immune),
    metadata = metadata(sce_immune))
sce_immune_2
class: SpatialExperiment 
dim: 37 12653 
metadata(2): spillover_matrix subset
assays(1): exprs
rownames(37): H3 CD44_GCG ... NKX6_1 ProINS
rowData names(0):
colnames(12653): 6238_Compressed_001_12 6238_Compressed_001_17 ...
  6396_Compressed_030_1440 6396_Compressed_030_1444
colData names(32): case_id panel ... Pheno1_scaled_cat cell_category
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):
spatialCoords names(2) : cell_x cell_y
imgData names(0):

Exocrine + Stroma + Other.

sce_stroma$sample_id <- paste0(sce_stroma$sample_id, "_stroma")
sce_exo$sample_id <- paste0(sce_exo$sample_id, "_exo")
# sce_other$sample_id <- paste0(sce_other$sample_id, "_other")

sce_rest <- cbind(sce_stroma, sce_exo, sce_other)
test_df <- t(assay(sce_rest, "exprs")) |>  
  tibble::as_tibble(rownames = "cell_id") |> 
  select(all_of(channels_compressed), cell_id) |> 
  tidyr::pivot_longer(cols = channels_compressed, names_to = "channel", values_to = "value")
Warning: Using an external vector in selections was deprecated in tidyselect 1.1.0.
ℹ Please use `all_of()` or `any_of()` instead.
  # Was:
  data %>% select(channels_compressed)

  # Now:
  data %>% select(all_of(channels_compressed))

See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
This warning is displayed once every 8 hours.
Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
test2_df <- full_coeffs |> 
  mutate(Immune_Islet_N_frac = mu_N_Immune/mu_N_Islet) |> 
  mutate(Islet_Immune_N_frac = mu_N_Islet/mu_N_Immune) |>
  select(channel, Immune_Islet_N_frac, Islet_Immune_N_frac)

calc_df <- test_df |> 
  left_join(test2_df, by = "channel") |> 
  # change CD3_NKX6_1 to CD3_NKX6.1
  mutate(channel = ifelse(channel == "CD3_NKX6_1", "CD3_NKX6.1", channel)) |>
  tidyr::separate(channel, into = c("Immune_ch", "Islet_ch"), sep = "_") |> 
  mutate(Immune_N = Immune_Islet_N_frac * value) |>
  mutate(Islet_N = Islet_Immune_N_frac * value)
Warning: Expected 2 pieces. Additional pieces discarded in 105977 rows [4, 9, 14, 19, 24, 29, 34, 39, 44, 49, 54, 59, 64,
69, 74, 79, 84, 89, 94, 99, ...].
calc_df_2 <- calc_df |> 
  select(-c(Immune_Islet_N_frac, Islet_Immune_N_frac, value)) |>
  tidyr::pivot_wider(id_cols = cell_id, names_from = c(Immune_ch, Islet_ch), values_from = c(Immune_N, Islet_N))

results_rest_optim <- calc_df_2 |> 
  # rename from Islet_N_CD20_SST to SST
  # rename from Immune_N_CD20_SSt to CD20
  # rename from Islet_N_CD3_NKX6.1 to NKX6.1
  # rename from Immune_N_CD3_NKX6.1 to CD3
  dplyr::rename_with(~stringr::str_replace(., "Immune_N_CD44_GCG", "CD44")) |>
  dplyr::rename_with(~stringr::str_replace(., "Immune_N_CD8a_INS", "CD8a")) |>
  dplyr::rename_with(~stringr::str_replace(., "Immune_N_CD4_ProINS", "CD4")) |>
  dplyr::rename_with(~stringr::str_replace(., "Immune_N_CD3_NKX6.1", "CD3")) |>
  dplyr::rename_with(~stringr::str_replace(., "Immune_N_CD20_SST", "CD20")) |>
  dplyr::rename_with(~stringr::str_replace(., "Islet_N_CD20_SST", "SST")) |>
  dplyr::rename_with(~stringr::str_replace(., "Islet_N_CD3_NKX6.1", "NKX6.1")) |>
  dplyr::rename_with(~stringr::str_replace(., "Islet_N_CD8a_INS", "INS")) |>
  dplyr::rename_with(~stringr::str_replace(., "Islet_N_CD4_ProINS", "ProINS")) |>
  dplyr::rename_with(~stringr::str_replace(., "Islet_N_CD44_GCG", "GCG"))
set.seed(222)
results_rest  <- 
  purrr::pmap(full_coeffs, \(mu_S_Immune, mu_S_Islet, mu_N_Immune, mu_N_Islet, sigma2_S_Immune, sigma2_S_Islet, sigma2_N_Immune, sigma2_N_Islet, channel, threshold_Islet, threshold_Immune) {
    message(channel)
    
    # Optimization Interval.
    lower_bound1 <- ifelse(mu_N_Immune - sigma2_N_Immune < 0, 0, mu_N_Immune - sigma2_N_Immune)
    upper_bound1 <- ifelse(mu_N_Immune + sigma2_N_Immune > threshold_Immune, threshold_Immune, mu_N_Immune + sigma2_N_Immune)
    interval <- c(lower_bound1, upper_bound1)

    # Iterate over cells.
    furrr::future_map(1:length(colnames(sce_rest)), \(i) {
      Immune_Islet_obs <- assay(sce_rest, "exprs")[channel, ][i]
      
      result <- optimize(
        f = likelihood_n_islet_n_immune,
        interval = interval,  # N_Immune cannot exceed the N_Immune + Sigma_Immune
        maximum = TRUE,
        Immune_Islet_obs = Immune_Islet_obs,
        mu_N_Immune = mu_N_Immune,
        sigma_N_Immune = sigma2_N_Immune,
        mu_N_Islet = mu_N_Islet,
        sigma_N_Islet = sigma2_N_Islet
      )

      # Extract the results
      N_Immune_opt <- result$maximum
      N_Islet_opt <- Immune_Islet_obs - N_Immune_opt
      tibble(Immune = N_Immune_opt, Islet = N_Islet_opt)
    }) |> bind_rows()
})
CD44_GCG
CD8a_INS
CD4_ProINS
CD3e_NKX6_1
CD20_SST
results_rest_optim <- results_rest |> bind_cols()
New names:
• `Immune` -> `Immune...1`
• `Islet` -> `Islet...2`
• `Immune` -> `Immune...3`
• `Islet` -> `Islet...4`
• `Immune` -> `Immune...5`
• `Islet` -> `Islet...6`
• `Immune` -> `Immune...7`
• `Islet` -> `Islet...8`
• `Immune` -> `Immune...9`
• `Islet` -> `Islet...10`
colnames(results_rest_optim) <- channels_uncompressed
results_rest_optim
# A tibble: 105,977 × 10
      CD44     GCG   CD8a     INS    CD4 ProINS   CD3e   NKX6_1    CD20      SST
     <dbl>   <dbl>  <dbl>   <dbl>  <dbl>  <dbl>  <dbl>    <dbl>   <dbl>    <dbl>
 1 0.00431 0.0516  0.0167 0.00229 0.0166 0.123  0.0249 -6.10e-3 0.00297  0.00615
 2 0.00431 0.0590  0.0167 0.00816 0.0166 0.136  0.0243  4.18e-2 0.00297  0.00657
 3 0.00431 0.0272  0.0167 0.0251  0.0142 0.0644 0.0249 -8.95e-3 0.00297  0.00586
 4 0.00431 0.0381  0.0167 0.00550 0.0142 0.0692 0.0243  3.48e-2 0.00297  0.00651
 5 0.00431 0.00502 0.0162 0.0321  0.0142 0.0567 0.0249  6.37e-3 0.00297  0.00686
 6 0.00431 0.0640  0.0167 0.0186  0.0166 0.0150 0.0249  1.89e-2 0.00297  0.00724
 7 0.00431 0.00845 0.0162 0.0300  0.0166 0.0435 0.0249  5.18e-2 0.00297 -0.00297
 8 0.00431 0.146   0.0167 0.00549 0.0166 0.0128 0.0249  1.94e-2 0.00297  0.00206
 9 0.00393 0.0180  0.0162 0.0338  0.0166 0.130  0.0249  2.18e-2 0.00297  0.0226 
10 0.00431 0.0324  0.0167 0.0111  0.0166 0.0422 0.0249  1.96e-4 0.00297  0.0258 
# ℹ 105,967 more rows
df <- as.data.frame(t(assay(sce_rest, "exprs")))

# Now add the decomposed channels.
df_decomposed <- df |> 
  mutate(CD44 = results_rest_optim$CD44) |>
  mutate(CD8a = results_rest_optim$CD8a) |> 
  mutate(CD4 = results_rest_optim$CD4) |>
  mutate(CD3e = results_rest_optim$CD3e) |>
  mutate(CD20 = results_rest_optim$CD20) |>
  mutate(GCG = results_rest_optim$GCG) |>
  mutate(INS = results_rest_optim$INS) |>
  mutate(SST = results_rest_optim$SST) |> 
  mutate(NKX6_1 = results_rest_optim$NKX6_1) |>
  mutate(ProINS = results_rest_optim$ProINS)

# Uncomrpessed panel now as rowData. 
fn_panel <- file.path(paths$folder_in, paste0("panel_", "Uncompressed", ".csv"))
panel <- read.csv(fn_panel, row.names = 1)

# Create a new SCE object.
sce_rest_2 <- SpatialExperiment(
    assays = list(exprs = as.matrix(t(df_decomposed))), 
    colData = colData(sce_rest), 
    spatialCoords = spatialCoords(sce_rest),
    metadata = metadata(sce_rest))

sce_rest_2
class: SpatialExperiment 
dim: 37 105977 
metadata(6): spillover_matrix subset ... spillover_matrix subset
assays(1): exprs
rownames(37): H3 CD44_GCG ... NKX6_1 ProINS
rowData names(0):
colnames(105977): 6238_Compressed_001_5 6238_Compressed_001_19 ...
  6396_Compressed_030_1443 6396_Compressed_030_1445
colData names(32): case_id panel ... Pheno1_scaled_cat cell_category
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):
spatialCoords names(2) : cell_x cell_y
imgData names(0):

Combine + save SCEs

sce <- cbind(sce_isl_2, sce_immune_2, sce_rest_2)
'sample_id's are duplicated across 'SpatialExperiment' objects to cbind; appending sample indices.
fn <- file.path(paths$folder_out, paste0(paths$object_type, "_", paths$panel_type, "sce.rds"))
saveRDS(sce, fn)
# sce <- readRDS(fn); sce_rest_2 <- sce[, sce$cell_category %in% c("Exocrine", "Stroma", "Other")]; 
# sce_immune_2 <- sce[, sce$cell_category == "Immune"]; sce_isl_2 <- sce[, sce$cell_category == "Islet"]

Cell Type Annotation

We now perform cell type annotation in 3 steps. First, for the Islet cells using the decompressed data.

Dimensionality Reduction to check for uncompressed marker distribution

RUN UMAP.

sce_sub <- sce[, metadata(sce)[["subset"]]]

cur_assay <- "exprs"
dimred_name <- paste("UMAP", cur_assay, sep = "_")
print(dimred_name)
[1] "UMAP_exprs"
# Run UMAP on a cell subset
# Extract Counts. 
if (cur_assay %in% assayNames(sce_sub)) {
  counts <- t(assay(sce_sub, cur_assay))
}
# Run UMAP.
umap_model <- uwot::umap(counts, ret_model = TRUE)
# Extract Embedding.
cur_umap <- umap_model$embedding
colnames(cur_umap) <- c("UMAP1", "UMAP2")
rownames(cur_umap) <- rownames(counts)

reducedDim(sce_sub, dimred_name) <- cur_umap
cur_dimred <- "UMAP_exprs"
cur_dat <- scuttle::makePerCellDF(sce_sub, features = rownames(sce_sub),
                                  exprs_values = cur_assay) |>
                dplyr::select(c(all_of(channels_uncompressed), paste0(dimred_name, ".1"), 
                                                  paste0(dimred_name, ".2")))

cur_dat <- cur_dat |>
    tidyr::pivot_longer(cols = all_of(channels_uncompressed),
                        names_to = "channel",
                        values_to = cur_assay)
  
# Plot marker expression
p <- plot_dim_red_channels(cur_dat, dimred_name, cur_assay,
                           channels_uncompressed, force_points = TRUE)


print(p)

fn <- paste0(paste(today, "Categories_Decomposed", cur_dimred,
                    sep = "_"), ".png")
do.call(ggsave, c(list(fn, p), plotsave_param))

Islet Cells:

channels_islet <- c("GCG", "PCSK2", # Alpha-Cells:
                    "INS", "SST", "NKX6_1", "ProINS", "IAPP", # Beta-cells:
                    "SST", "PDX1", # Delta-cells. PDX1 also beta-cells
                    "PPY", # Gamma cells.
                    "SYP", "CD99", "Ecdh", "CD31" # General markers and included just in case.
                    )

sce_isl_3 <- sce[channels_islet, sce$cell_category == "Islet"]
sce_isl_3
class: SpatialExperiment 
dim: 14 20999 
metadata(10): spillover_matrix subset ... spillover_matrix subset
assays(1): exprs
rownames(14): GCG PCSK2 ... Ecdh CD31
rowData names(0):
colnames(20999): 6238_Compressed_001_28 6238_Compressed_001_293 ...
  6396_Compressed_030_1410 6396_Compressed_030_1439
colData names(32): case_id panel ... Pheno1_scaled_cat cell_category
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):
spatialCoords names(2) : cell_x cell_y
imgData names(1): sample_id

Clustering: PhenoGraph

clust_method <- c("Pheno1")
cur_assay <- "exprs"
# Number of nearest-neighbors
k <- 30

## 
clust_name <- paste(clust_method, cur_assay, sep = "_")
writeLines(c("\n", clust_name))


Pheno1_exprs
# Run Phenograph
if (!clust_name %in% colnames(colData(sce_isl_3))) {
  set.seed(seed)
  cur_pheno_annoy <- Rphenoannoy::Rphenoannoy(t(assay(sce_isl_3, cur_assay)), k = k)

  cur_pheno <- DataFrame(cur_pheno_annoy[[2]]$membership)
  colnames(cur_pheno) <- clust_name
  rownames(cur_pheno) <- colnames(assay(sce_isl_3, cur_assay))

  # Add Phenograph clusters to the colData of the SCE object
  # Cluster `0` is attributed to non-subsetted cells and not islet cells
  colData(sce_isl_3)[, clust_name] <- cur_pheno
  remove(cur_pheno)
}
Run Rphenograph starts:
  -Input data of 20999 rows and 14 columns
  -k is set to 30
  Finding nearest neighbors...DONE ~ 4.557 s
  Compute jaccard coefficient between nearest-neighbor sets...
Presorting knn...
presorting DONE ~ 0.947 s
  Start jaccard
DONE ~ 0.025 s
  Build undirected graph from the weighted links...DONE ~ 0.189 s
  Run louvain clustering on the graph ...DONE ~ 1.922 s
Run Rphenograph DONE, totally takes 6.69300000000021s.
  Return a community class
  -Modularity value: 0.8225092 
  -Number of clusters: 20

Heatmap of marker expression by cluster

Quickly check cluster abundances.

clust_freq <- table(colData(sce_isl_3)[[clust_name]])
clust_freq

   1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16 
 734  601  845 2298 1329  817 2746  872  699 1212  920 1298  901  623  904 1016 
  17   18   19   20 
1795 1186  168   35 
cur_method <- "Pheno1"
name_cur_assay <- "exprs"
cur_assay <- "exprs"

clust_name <- paste(cur_method, cur_assay, sep = "_")
message(clust_name)
Pheno1_exprs
message(name_cur_assay)
exprs
# Summarize the data
hm <- summarize_heatmap(sce_isl_3,
                        expr_values = name_cur_assay,
                        cluster_by = clust_name,
                        channels = channels_islet)

# Display the heatmap
fn <- paste0(paste(today, "Clusters", clust_name, "ISLET", "Heatmap",
                    sep = "_"), ".html")
heatmaply::heatmaply(
    heatmaply::normalize(hm), main = clust_name, 
    file = file.path(paths$folder_script, fn))

Beta: 6, 12, 7, 9, 14, 13, 10, 11 Delta: 2, 1 Gamma: 20 alpha-cells: 19, 17, 15, 18, 16, 3, 8 Other: 5, alpha/beta mixed: 4

13, 10, 11

clust_methods <- c("Pheno1")
clust_assay <- c("exprs")

clust_name <- paste(clust_methods, clust_assay, sep = "_")
print(clust_name)
[1] "Pheno1_exprs"
# Define which clusters correspond to which cell types
clust_other <- c(5, 4, 20)
clust_delta <- c(2, 1)
clust_beta <- c(6, 12, 7, 9, 14, 13, 10) 
clust_gamma <- c()
clust_alpha <- c(19, 17, 15, 18, 16, 3, 8, 11) 
all_clust <- sort(c(clust_other, clust_delta, clust_beta, clust_gamma,
                    clust_alpha))

if ((!length(unique(all_clust)) ==
     length(unique(colData(sce_isl_3)[, clust_name]))) ||
    any(duplicated(all_clust))) {
  stop("Recheck cluster attribution")
}

# Add cell types to the SCE object
colData(sce_isl_3)[colData(sce_isl_3)[, clust_name] %in% clust_beta,
                   "cell_type"] <- "Beta"
colData(sce_isl_3)[colData(sce_isl_3)[, clust_name] %in% clust_alpha,
                   "cell_type"] <- "Alpha"
colData(sce_isl_3)[colData(sce_isl_3)[, clust_name] %in% clust_delta,
                   "cell_type"] <- "Delta"
colData(sce_isl_3)[colData(sce_isl_3)[, clust_name] %in% clust_gamma,
                   "cell_type"] <- "Gamma"
colData(sce_isl_3)[colData(sce_isl_3)[, clust_name] %in% clust_other,
                   "cell_type"] <- "Other"
# Quick Check Cell Types.
colData(sce_isl_3) |> 
  as_tibble() |> 
  dplyr::count(case_id, cell_type, donor_type) |> 
  arrange(cell_type)
# A tibble: 12 × 4
   case_id cell_type donor_type      n
   <chr>   <chr>     <fct>       <int>
 1 6238    Alpha     NoDiabetes   1428
 2 6271    Alpha     NoDiabetes    420
 3 6396    Alpha     RecentOnset  5858
 4 6238    Beta      NoDiabetes   4527
 5 6271    Beta      NoDiabetes   2602
 6 6396    Beta      RecentOnset  1167
 7 6238    Delta     NoDiabetes    350
 8 6271    Delta     NoDiabetes     66
 9 6396    Delta     RecentOnset   919
10 6238    Other     NoDiabetes   1434
11 6271    Other     NoDiabetes    450
12 6396    Other     RecentOnset  1778

Visualize clusters on reduced dimensions

imgloader <- function(x, image_dir, image_names,
                     suffix_rem = "", suffix_add = "",
                     bit_depth = 16, type, ...) {
  require(cytomapper)

  image_list <- file.path(image_dir, image_names)

  # Test if the image list exist
  test_exist <- which(!file.exists(image_list))
  if (length(test_exist) > 0) {
    stop(c("The following images were not found:\n",
           paste(image_list[test_exist], collapse = "\n")))
  } else {
    # Load and scale the images
    images <- loadImages(image_list, ...)
    # images <- scaleImages(images, (2 ^ bit.depth) - 1)

    # Add image names to metadata
    mcols(images)$ImageName <- gsub(suffix_rem, "", names(images))
    mcols(images)$ImageName <- paste0(mcols(images)$ImageName, suffix_add)

    # Add channel names
    if (type == "stacks") {
      print("Loading image stacks")
      channelNames(images) <- rownames(x)
    }

    return(images)
  }
}

Select cluster(s) and assay to show

colData(sce_isl)$Pheno1_exprs <- sce_isl_3[, colnames(sce_isl)]$Pheno1_exprs

viz_clust <- c(9, 10, 5, 1, 3, 2, 6) # Select cluster(s) to visualize.
viz_method <- "Pheno1"
viz_assay <- "exprs"

Load images and masks

if (!is.null(viz_clust)) {
  nb_images <- 14
  image_extension <- ".tiff"
  clust_name <- paste(viz_method, viz_assay, sep = "_")

  # Subset the SCE
  sce_viz <- sce_isl[, colData(sce_isl)[[clust_name]] %in% viz_clust]

  # Select random image.
  set.seed(seed)
  image_sub <- sort(sample(
    unique(sce_viz$image_fullname),
  min(length(unique(sce_viz$image_fullname)), nb_images)))

  # Folders
  folder_images <- file.path(paths$folder_in, "img", paths$panel_type)
  folder_masks <- file.path(paths$folder_in, "masks_cells",
                            paths$panel_type, "whole-cell")

  # Load images and masks
  images <- imgloader(
    x = sce_viz,
    image_dir = folder_images,
    image_names = image_sub,
    type = "stacks"
  )

  masks <- imgloader(
    x = sce_viz,
    image_dir = folder_masks,
    image_names = image_sub,
    as.is = TRUE,
    type = "masks"
  )

  sce_viz <- sce_viz[, sce_viz$image_fullname %in% image_sub]
  sce_viz$ImageName <- gsub(image_extension, "", sce_viz$image_fullname)
}
Loading required package: cytomapper
Loading required package: EBImage

Attaching package: 'EBImage'
The following object is masked from 'package:purrr':

    transpose
The following object is masked from 'package:data.table':

    transpose
The following object is masked from 'package:SummarizedExperiment':

    resize
The following object is masked from 'package:Biobase':

    channel
The following objects are masked from 'package:GenomicRanges':

    resize, tile
The following objects are masked from 'package:IRanges':

    resize, tile

Attaching package: 'cytomapper'
The following objects are masked from 'package:Biobase':

    channelNames, channelNames<-
[1] "Loading image stacks"

FIXME: Make channel decomposition on pixel level aswell, such that we can visualize the decomposed channels.

Interactive with Cytoviewer

# Use cytoviewer with images, masks and object
library(cytoviewer)
if (!is.null(viz_clust)) {
  channels_view <- rownames(sce_isl)
  sub_images <- cytomapper::getChannels(images, channels_view)
  app <- cytoviewer(image = sub_images, 
                    mask = masks, 
                    object = sce_viz[channels_view, ], 
                    img_id = "ImageName", 
                    cell_id = "cell_number")

  if (interactive()) {
    shiny::runApp(app)
  }
}

Immune Cells:

channels_immune <- c("CD3e", "CD4", "CD8a", "CD45RA", "FOXP3", "CD45RO", # T-cells: 
                    "CD20", # B-cells. Likely almost absent.
                    "CD44", 
                     "CD68", # Macrophages.
                    "MPO") # Neutrophils.

sce_immune_3 <- sce_immune_2[channels_immune, ]

Clustering: PhenoGraph

## 
clust_name <- paste(clust_method, cur_assay, sep = "_")
writeLines(c("\n", clust_name))


Pheno1_exprs
# Run Phenograph
if (!clust_name %in% colnames(colData(sce_immune_3))) {
  set.seed(seed)
  cur_pheno_annoy <- Rphenoannoy::Rphenoannoy(t(assay(sce_immune_3, cur_assay)), k = k)

  cur_pheno <- DataFrame(cur_pheno_annoy[[2]]$membership)
  colnames(cur_pheno) <- clust_name
  rownames(cur_pheno) <- colnames(assay(sce_immune_3, cur_assay))

  # Add Phenograph clusters to the colData of the SCE object
  # Cluster `0` is attributed to non-subsetted cells and not islet cells
  colData(sce_immune_3)[, clust_name] <- cur_pheno
  remove(cur_pheno)
}
Run Rphenograph starts:
  -Input data of 12653 rows and 10 columns
  -k is set to 30
  Finding nearest neighbors...DONE ~ 2.902 s
  Compute jaccard coefficient between nearest-neighbor sets...
Presorting knn...
presorting DONE ~ 0.554 s
  Start jaccard
DONE ~ 0.014 s
  Build undirected graph from the weighted links...DONE ~ 0.123 s
  Run louvain clustering on the graph ...DONE ~ 0.64 s
Run Rphenograph DONE, totally takes 3.67900000000009s.
  Return a community class
  -Modularity value: 0.8723812 
  -Number of clusters: 23

Heatmap of marker expression by cluster

Quickly check cluster abundances.

clust_freq <- table(colData(sce_immune_3)[[clust_name]])
clust_freq

   1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16 
 426 1424  823  697  560  186  704 1030  989  389  341  442  424 1189  323  566 
  17   18   19   20   21   22   23 
 640  342  444  145  303  158  108 
cur_method <- "Pheno1"
name_cur_assay <- "exprs"
cur_assay <- "exprs"

clust_name <- paste(cur_method, cur_assay, sep = "_")
message(clust_name)
Pheno1_exprs
message(name_cur_assay)
exprs
# Summarize the data
hm <- summarize_heatmap(sce_immune_3,
                        expr_values = name_cur_assay,
                        cluster_by = clust_name,
                        channels = channels_immune)

# Display the heatmap
fn <- paste0(paste(today, "Clusters", clust_name, "ISLET", "Heatmap",
                    sep = "_"), ".html")
heatmaply::heatmaply(
    heatmaply::normalize(hm), main = clust_name, 
    file = file.path(paths$folder_script, fn))

T-CD8: 13 T-CD4: 22, NK: 6 Other: 5, 10, 7, 19, 11, 3, 14, 9, 2, 18, 12, 21 Neutrophil: 20, 15 Mos: 23, 17, 16, 1

clust_methods <- c("Pheno1")
clust_assay <- c("exprs")

clust_name <- paste(clust_methods, clust_assay, sep = "_")
print(clust_name)
[1] "Pheno1_exprs"
# Define which clusters correspond to which cell types
clust_tcd4 <- c(22)
clust_tcd8 <- c(13) # 14 confirmed.
clust_tdn <- c()
clust_other <- c(6, 5, 10, 7, 19, 11, 3, 14, 9, 2, 18, 12, 21) 
clust_B <- c()
clust_neutrophil <- c(20, 15) # confirmed.
clust_macrophage <- c(23, 17, 16, 1, 4, 8) # 15, 7 confirmed.

all_clust <- sort(c(clust_tcd4, clust_tcd8, clust_tdn, clust_B, clust_neutrophil,
                    clust_macrophage, clust_other))

if ((!length(unique(all_clust)) ==
     length(unique(colData(sce_immune_3)[, clust_name]))) ||
    any(duplicated(all_clust))) {
  stop("Recheck cluster attribution")
}

# Add cell types to the SCE object
colData(sce_immune_3)[colData(sce_immune_3)[, clust_name] %in% clust_tcd4,
                   "cell_type"] <- "T_CD4"
colData(sce_immune_3)[colData(sce_immune_3)[, clust_name] %in% clust_tcd8,
                   "cell_type"] <- "T_CD8"
colData(sce_immune_3)[colData(sce_immune_3)[, clust_name] %in% clust_B,
                   "cell_type"] <- "B"
colData(sce_immune_3)[colData(sce_immune_3)[, clust_name] %in% clust_neutrophil,
                   "cell_type"] <- "Neutrophil"
colData(sce_immune_3)[colData(sce_immune_3)[, clust_name] %in% clust_macrophage,
                   "cell_type"] <- "Myeloid"
colData(sce_immune_3)[colData(sce_immune_3)[, clust_name] %in% clust_other,
                    "cell_type"] <- "Other" 
colData(sce_immune_3)[colData(sce_immune_3)[, clust_name] %in% clust_tdn,
                   "cell_type"] <- "T_DN"
# Quick Check Cell Types.
colData(sce_immune_3) |> 
  as_tibble() |> 
  dplyr::count(case_id, cell_type, donor_type) |> 
  arrange(cell_type)
# A tibble: 15 × 4
   case_id cell_type  donor_type      n
   <chr>   <chr>      <fct>       <int>
 1 6238    Myeloid    NoDiabetes    572
 2 6271    Myeloid    NoDiabetes    185
 3 6396    Myeloid    RecentOnset  2710
 4 6238    Neutrophil NoDiabetes    224
 5 6271    Neutrophil NoDiabetes     19
 6 6396    Neutrophil RecentOnset   225
 7 6238    Other      NoDiabetes   1703
 8 6271    Other      NoDiabetes    366
 9 6396    Other      RecentOnset  6067
10 6238    T_CD4      NoDiabetes     10
11 6271    T_CD4      NoDiabetes      3
12 6396    T_CD4      RecentOnset   145
13 6238    T_CD8      NoDiabetes     63
14 6271    T_CD8      NoDiabetes     46
15 6396    T_CD8      RecentOnset   315
# B-cells likely wrong annotaiton.

Visualize clusters on reduced dimensions

Select cluster(s) and assay to show

colData(sce_immune)$Pheno1_exprs <- sce_immune_3[, colnames(sce_immune)]$Pheno1_exprs

viz_clust <- c(1:19) # Select cluster(s) to visualize.
viz_method <- "Pheno1"
viz_assay <- "exprs"

Load images and masks

if (!is.null(viz_clust)) {
  nb_images <- 14
  image_extension <- ".tiff"
  clust_name <- paste(viz_method, viz_assay, sep = "_")

  # Subset the SCE
  sce_viz <- sce_immune[, colData(sce_immune)[[clust_name]] %in% viz_clust]

  # Select random image.
  set.seed(seed)
  image_sub <- sort(sample(
    unique(sce_viz$image_fullname),
  min(length(unique(sce_viz$image_fullname)), nb_images)))

  # Folders
  folder_images <- file.path(paths$folder_in, "img", paths$panel_type)
  folder_masks <- file.path(paths$folder_in, "masks_cells",
                            paths$panel_type, "whole-cell")

  # Load images and masks
  images <- imgloader(
    x = sce_viz,
    image_dir = folder_images,
    image_names = image_sub,
    type = "stacks"
  )

  masks <- imgloader(
    x = sce_viz,
    image_dir = folder_masks,
    image_names = image_sub,
    as.is = TRUE,
    type = "masks"
  )

  sce_viz <- sce_viz[, sce_viz$image_fullname %in% image_sub]
  sce_viz$ImageName <- gsub(image_extension, "", sce_viz$image_fullname)
}
[1] "Loading image stacks"

Interactive with Cytoviewer

# Use cytoviewer with images, masks and object
library(cytoviewer)
if (!is.null(viz_clust)) {
  channels_view <- rownames(sce_immune)
  sub_images <- cytomapper::getChannels(images, channels_view)
  app <- cytoviewer(image = sub_images,
                    mask = masks,
                    object = sce_viz[channels_view, ], 
                    img_id = "ImageName",
                    cell_id = "cell_number")

  if (interactive()) {
    shiny::runApp(app)
  }
}

Exocrine + Stroma Cells:

channels_exo <- c("Ecdh", 
                  "CD44", # Pan-Exo 
                  "CK19", 
                  "PDX1", # Ductal cells
                  "SMA", # Smooth Muscle; Activated Fibroblasts.
                  "AMY", # Acinar 
                  "CD31", # Endothelial)
                  "CA9") 

sce_exo_3 <- sce_rest_2[channels_exo, ]

Clustering: PhenoGraph

## 
clust_name <- paste(clust_method, cur_assay, sep = "_")
writeLines(c("\n", clust_name))


Pheno1_exprs
# Run Phenograph
if (!clust_name %in% colnames(colData(sce_exo_3))) {
  set.seed(seed)
  cur_pheno_annoy <- Rphenoannoy::Rphenoannoy(t(assay(sce_exo_3, cur_assay)), k = k)

  cur_pheno <- DataFrame(cur_pheno_annoy[[2]]$membership)
  colnames(cur_pheno) <- clust_name
  rownames(cur_pheno) <- colnames(assay(sce_exo_3, cur_assay))

  # Add Phenograph clusters to the colData of the SCE object
  # Cluster `0` is attributed to non-subsetted cells and not islet cells
  colData(sce_exo_3)[, clust_name] <- cur_pheno
  remove(cur_pheno)
}
Run Rphenograph starts:
  -Input data of 105977 rows and 8 columns
  -k is set to 30
  Finding nearest neighbors...DONE ~ 27.24 s
  Compute jaccard coefficient between nearest-neighbor sets...
Presorting knn...
presorting DONE ~ 4.8 s
  Start jaccard
DONE ~ 0.088 s
  Build undirected graph from the weighted links...DONE ~ 1.359 s
  Run louvain clustering on the graph ...DONE ~ 16.759 s
Run Rphenograph DONE, totally takes 45.445999999999s.
  Return a community class
  -Modularity value: 0.8709948 
  -Number of clusters: 31

Heatmap of marker expression by cluster

Quickly check cluster abundances.

clust_freq <- table(colData(sce_exo_3)[[clust_name]])
clust_freq

   1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16 
3547 6064 4225 3413 1932 4543 4243 3365 2925 7121 6531 3551 3476 1942 3807 3224 
  17   18   19   20   21   22   23   24   25   26   27   28   29   30   31 
4745  822  378 4501 4477 4721 2192 2630 2199 4853 3170 3008  923 2554  895 
cur_assay <- "exprs"
clust_name <- paste(cur_method, cur_assay, sep = "_")
message(clust_name)
Pheno1_exprs
message(cur_assay)
exprs
# Summarize the data
hm <- summarize_heatmap(sce_exo_3, expr_values = cur_assay,
                        cluster_by = clust_name,
                        channels = channels_exo)
hm
         Ecdh        CD44       CK19       PDX1        SMA        AMY
1  0.14983135 0.004307549 0.01328572 0.02533717 0.14234041 0.09162547
2  0.56939991 0.004307549 0.01476372 0.05925981 0.04105287 0.28725911
3  0.38839434 0.004307549 0.02490855 0.04982429 0.35212821 0.29453913
4  0.84819587 0.004307549 0.01653588 0.06872280 0.04058910 0.25147103
5  0.13005991 0.004307549 0.01735473 0.02324715 0.84666689 0.06942178
6  0.56782040 0.004307549 0.01687559 0.06050042 0.04176513 0.51704293
7  0.51647498 0.004307549 0.17747254 0.08697404 0.05301493 0.27610286
8  1.04273284 0.004307549 0.27874838 0.30375619 0.05810918 0.30637757
9  0.43874011 0.004307549 0.97234507 0.31414511 0.07639490 0.15743609
10 0.39658611 0.004307549 0.43288148 0.29333855 0.07877719 0.17672357
11 0.79485353 0.004307549 0.67719794 0.36241759 0.07182983 0.22950880
12 0.22304566 0.004307549 0.01660130 0.03584923 0.05630251 0.11341332
13 1.34244566 0.004307549 0.43861439 0.52446691 0.06858005 0.67975774
14 1.26597971 0.004307549 0.02556046 0.08819125 0.04157502 0.58500955
15 0.18296583 0.004307549 0.01474290 0.04211528 0.04210795 0.28617931
16 0.74971857 0.004307549 0.65987668 0.36204914 0.06291548 0.64949101
17 0.61371305 0.004307549 1.56690807 0.30642285 0.06908359 0.16358277
18 0.27775545 0.004307549 0.01213637 0.05373568 0.08947229 0.77244758
19 0.08853996 0.004307549 0.02137439 0.01808019 2.49523755 0.05411067
20 0.33681775 0.004307549 0.01656282 0.05257284 0.03956788 0.52691301
21 0.31682454 0.004307549 0.01418684 0.07040967 0.04209082 0.95343344
22 0.36677566 0.004307549 0.01535156 0.05435656 0.04041413 0.30964799
23 0.43284893 0.004307549 0.02214219 0.04407031 0.07239359 0.29185655
24 0.84049130 0.004307549 0.01509659 0.06838766 0.03518946 0.47387274
25 0.68400119 0.004307549 0.01730337 0.07789659 0.04102914 0.89825874
26 1.10559303 0.004307549 0.31976628 0.35378377 0.05720911 1.36074764
27 0.60419270 0.004307549 0.01411833 0.08822477 0.04136256 1.24577265
28 0.36906133 0.004307549 0.01393544 0.08254205 0.04317645 1.47406134
29 0.97007965 0.004307549 0.01613171 0.09898964 0.04055951 1.57125991
30 0.54368739 0.004307549 0.01730896 0.08841986 0.04546984 1.92461585
31 2.10722403 0.004307549 1.78213986 0.60652077 0.07925724 0.72872404
         CD31        CA9
1  0.15397941 0.01774506
2  0.02431137 0.02432417
3  0.02763228 0.04316357
4  0.02594964 0.02733137
5  0.04857285 0.02196356
6  0.02597827 0.02803190
7  0.02045637 0.03582682
8  0.01773869 0.03540149
9  0.01523137 0.04458512
10 0.01505070 0.03727501
11 0.01612048 0.04868868
12 0.01738729 0.02058095
13 0.02320205 0.04393136
14 0.03458049 0.04353523
15 0.01868236 0.02136079
16 0.01915734 0.04133103
17 0.01604831 0.06105087
18 0.22987077 0.02166004
19 0.08581290 0.01848027
20 0.02320499 0.02525009
21 0.03214560 0.02756279
22 0.02208901 0.02236137
23 0.02028548 0.23827565
24 0.02894051 0.02629785
25 0.03202350 0.03235486
26 0.02598490 0.04115513
27 0.04062754 0.03276893
28 0.04185566 0.03171415
29 0.04615143 0.03509572
30 0.04429425 0.03516834
31 0.03797643 0.15089170
# Display the heatmap
fn <- paste0(paste(today, "Clusters", clust_name, "EXOCRINE", "Heatmap",
                    sep = "_"), ".html")
heatmaply::heatmaply(
    hm, main = clust_name, 
    file = file.path(paths$folder_script, fn))
sce_rest
class: SpatialExperiment 
dim: 27 105977 
metadata(6): spillover_matrix subset ... spillover_matrix subset
assays(6): counts compcounts ... scaled fastMNN_case_id
rownames(27): H3 CD44_GCG ... DNA3 PPY
rowData names(10): channel metal ... shortname channel_name
colnames(105977): 6238_Compressed_001_5 6238_Compressed_001_19 ...
  6396_Compressed_030_1443 6396_Compressed_030_1445
colData names(32): case_id panel ... Pheno1_scaled_cat cell_category
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):
spatialCoords names(2) : cell_x cell_y
imgData names(1): sample_id

Other: 8, 4, 7, 23, 15, 12, 3 Acinar: 26, 29, 30, 28, 27, 13, 14, 24, 25, 21, 20, 22, 6, 2, Ductal: 31, 16, 11, 10, 8, 17 Fibro_SM: 5, 19 Endothelail: 18, 1

clust_methods <- c("Pheno1")
clust_assay <- c("exprs")

clust_name <- paste(clust_methods, clust_assay, sep = "_")
print(clust_name)
[1] "Pheno1_exprs"
# Define which clusters correspond to which cell types
#clust_other <- c(9, 20, 16, 11)
#clust_acinar <- c(10, 3, 5, 13, 15, 19, 6, 12, 17)
#clust_ductal <- c(2, 8, 18, 4, 7)
#clust_fibro_sm <- c(1)
#clust_endo <- c(14)

clust_other <- c(4, 7, 23, 15, 12, 3)
clust_acinar <- c(26, 29, 30, 28, 27, 13, 14, 24, 25, 21, 20, 22, 6, 2)
clust_ductal <- c(31, 16, 8:11, 17)
clust_fibro_sm <- c(5, 19)
clust_endo <- c(18, 1)

all_clust <- sort(c(clust_acinar, clust_ductal, clust_fibro_sm, clust_endo, clust_other))

if ((!length(unique(all_clust)) ==
     length(unique(colData(sce_exo_3)[, clust_name]))) ||
    any(duplicated(all_clust))) {
  stop("Recheck cluster attribution")
}

# Add cell types to the SCE object
colData(sce_exo_3)[colData(sce_exo_3)[, clust_name] %in% clust_other,
                   "cell_type"] <- "Other"
colData(sce_exo_3)[colData(sce_exo_3)[, clust_name] %in% clust_acinar,
                   "cell_type"] <- "Acinar"
colData(sce_exo_3)[colData(sce_exo_3)[, clust_name] %in% clust_ductal,
                   "cell_type"] <- "Ductal"
colData(sce_exo_3)[colData(sce_exo_3)[, clust_name] %in% clust_fibro_sm,
                   "cell_type"] <- "Fibro_SM"
colData(sce_exo_3)[colData(sce_exo_3)[, clust_name] %in% clust_endo,
                   "cell_type"] <- "Endothelial"
sce$cell_type <- "0"
sce[, colnames(sce_isl_3)]$cell_type <- sce_isl_3$cell_type
sce[, colnames(sce_immune_3)]$cell_type <- sce_immune_3$cell_type
sce[, colnames(sce_exo_3)]$cell_type <- sce_exo_3$cell_type
table(sce$cell_type)

     Acinar       Alpha        Beta       Delta      Ductal Endothelial 
      49061        7706        8296        1335       28806        4369 
   Fibro_SM     Myeloid  Neutrophil       Other       T_CD4       T_CD8 
       2310        3467         468       33229         158         424 

Save SCE

fn_spe <- file.path(paths$folder_out, paste0(paths$object_type, "_", paths$panel_type, ".rds"))
message(assayNames(sce))
exprs
saveRDS(sce, fn_spe)
# readRDS(fn_spe)

Visualize clusters

Load packages

suppressPackageStartupMessages(c(
  library(heatmaply),
  library(htmltools)#,
  #library(cytomapper)
))

Visualize clusters on reduced dimensions

Visualize attributed categories

assay_sel <- c("exprs")
names(assay_sel) <- c("exprs")

Plot cell types on reduced dimensions

sce_sub$cell_type <- sce[, metadata(spe)$subset]$cell_type
sce_sub <- sce_sub[, sce_sub$cell_type != "Other"]


# Prepare the data
cur_dat <- makePerCellDF(sce_sub, use_dimred = TRUE) |>
  arrange(case_id, donor_type)

# Plot
cur_dimred <- "UMAP"
dimred_name <- paste(cur_dimred, cur_assay, sep = "_")
clust_name <- "cell_type"

p <- plot_dim_red(cur_dat, dimred_name, clust_name,
                  sample = TRUE, size = 1, alpha = 1)
print(p)

fn <- paste0(paste(today, clust_name, cur_dimred,
                    sep = "_"), ".png")
do.call(ggsave, c(list(fn, p), plotsave_param))

Cell Type heatmap

clust_name <- "cell_type"
channels_heatmap <- rownames(sce)[!rownames(sce) %in% c("CD44", "PPY", "CA9", "DNA1", "DNA3", "H3", "PTPRN", "GLUT1", "FOXP3", "CD45RA")]

# Summarize the data
hm <- summarize_heatmap(sce[, metadata(sce)$subset],
                      expr_values = name_cur_assay,
                      cluster_by = clust_name,
                      channels = channels_heatmap)

# Display the heatmap
fn <- paste0(paste(today, clust_name, "Heatmap",
                  sep = "_"), ".html")
set.seed(1996)
heatmaply(
heatmaply::normalize(hm), main = clust_name,
file = file.path(paths$folder_script, fn))

Problem: - SST. CD45RA. Good: CD8a_INS. -


sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] htmltools_0.5.8.1           heatmaply_1.5.0            
 [3] viridis_0.6.5               viridisLite_0.4.2          
 [5] plotly_4.10.4               cytoviewer_1.2.0           
 [7] cytomapper_1.14.0           EBImage_4.44.0             
 [9] BiocParallel_1.36.0         patchwork_1.2.0            
[11] ranger_0.16.0               clustree_0.5.1             
[13] ggraph_2.2.1                Rphenoannoy_0.1.0          
[15] Matrix_1.6-5                igraph_2.1.4               
[17] scran_1.30.2                scater_1.30.1              
[19] scuttle_1.12.0              furrr_0.3.1                
[21] future_1.49.0               purrr_1.0.2                
[23] tictoc_1.2.1                data.table_1.17.2          
[25] SpatialExperiment_1.12.0    SingleCellExperiment_1.24.0
[27] SummarizedExperiment_1.32.0 Biobase_2.62.0             
[29] GenomicRanges_1.54.1        GenomeInfoDb_1.38.8        
[31] IRanges_2.36.0              S4Vectors_0.40.2           
[33] BiocGenerics_0.48.1         MatrixGenerics_1.14.0      
[35] matrixStats_1.5.0           dplyr_1.1.4                
[37] ggplot2_3.5.1              

loaded via a namespace (and not attached):
  [1] RcppAnnoy_0.0.22          later_1.3.2              
  [3] bitops_1.0-9              svgPanZoom_0.3.4         
  [5] tibble_3.2.1              polyclip_1.10-7          
  [7] lifecycle_1.0.4           edgeR_4.0.16             
  [9] rprojroot_2.0.4           globals_0.18.0           
 [11] lattice_0.22-7            MASS_7.3-60              
 [13] crosstalk_1.2.1           dendextend_1.17.1        
 [15] magrittr_2.0.3            limma_3.58.1             
 [17] sass_0.4.9                rmarkdown_2.27           
 [19] jquerylib_0.1.4           yaml_2.3.10              
 [21] metapod_1.10.1            httpuv_1.6.15            
 [23] sp_2.1-4                  RColorBrewer_1.1-3       
 [25] abind_1.4-8               zlibbioc_1.48.2          
 [27] RCurl_1.98-1.14           tweenr_2.0.3             
 [29] git2r_0.33.0              seriation_1.5.5          
 [31] GenomeInfoDbData_1.2.11   ggrepel_0.9.5            
 [33] irlba_2.3.5.1             listenv_0.9.1            
 [35] terra_1.7-78              dqrng_0.4.1              
 [37] parallelly_1.44.0         svglite_2.1.3            
 [39] DelayedMatrixStats_1.24.0 codetools_0.2-20         
 [41] DelayedArray_0.28.0       ggforce_0.4.2            
 [43] tidyselect_1.2.1          raster_3.6-26            
 [45] farver_2.1.2              ScaledMatrix_1.10.0      
 [47] TSP_1.2-4                 webshot_0.5.5            
 [49] jsonlite_2.0.0            BiocNeighbors_1.20.2     
 [51] tidygraph_1.3.1           iterators_1.0.14         
 [53] systemfonts_1.1.0         foreach_1.5.2            
 [55] tools_4.3.1               ragg_1.3.2               
 [57] Rcpp_1.0.14               glue_1.8.0               
 [59] gridExtra_2.3             SparseArray_1.2.4        
 [61] xfun_0.52                 HDF5Array_1.30.1         
 [63] ca_0.71.1                 shinydashboard_0.7.2     
 [65] withr_3.0.2               fastmap_1.2.0            
 [67] rhdf5filters_1.14.1       bluster_1.12.0           
 [69] fansi_1.0.6               digest_0.6.37            
 [71] rsvd_1.0.5                mime_0.13                
 [73] R6_2.6.1                  textshaping_0.4.0        
 [75] colorspace_2.1-1          jpeg_0.1-11              
 [77] utf8_1.2.5                tidyr_1.3.1              
 [79] generics_0.1.4            graphlayouts_1.1.1       
 [81] httr_1.4.7                htmlwidgets_1.6.4        
 [83] S4Arrays_1.2.1            whisker_0.4.1            
 [85] uwot_0.2.2                pkgconfig_2.0.3          
 [87] gtable_0.3.6              registry_0.5-1           
 [89] workflowr_1.7.1           XVector_0.42.0           
 [91] fftwtools_0.9-11          scales_1.3.0             
 [93] png_0.1-8                 knitr_1.47               
 [95] reshape2_1.4.4            rjson_0.2.23             
 [97] rhdf5_2.46.1              cachem_1.1.0             
 [99] stringr_1.5.1             shinycssloaders_1.0.0    
[101] miniUI_0.1.1.1            vipor_0.4.7              
[103] pillar_1.9.0              grid_4.3.1               
[105] vctrs_0.6.5               RANN_2.6.2               
[107] promises_1.3.0            BiocSingular_1.18.0      
[109] beachmat_2.18.1           xtable_1.8-4             
[111] cluster_2.1.8.1           archive_1.1.12           
[113] beeswarm_0.4.0            evaluate_1.0.3           
[115] magick_2.8.3              cli_3.6.5                
[117] locfit_1.5-9.9            compiler_4.3.1           
[119] rlang_1.1.6               crayon_1.5.3             
[121] labeling_0.4.3            mclust_6.1.1             
[123] plyr_1.8.9                fs_1.6.6                 
[125] ggbeeswarm_0.7.2          stringi_1.8.7            
[127] nnls_1.6                  assertthat_0.2.1         
[129] munsell_0.5.1             lazyeval_0.2.2           
[131] tiff_0.1-12               colourpicker_1.3.0       
[133] sparseMatrixStats_1.14.0  Rhdf5lib_1.24.2          
[135] shiny_1.8.1.1             statmod_1.5.0            
[137] highr_0.11                fontawesome_0.5.3        
[139] memoise_2.0.1             bslib_0.7.0